Background Information: We were tasked by Kirk Bogard, the Associate Vice President for Development and External Relations at Miami University, to explore a dataset of real student data and find relationships and patterns that he can use to give Miami a competitive advantage. After exploring the data, we identified three variables that point to a potential relationship: survey_salary, survey_internships, and survey_state. We plan to build a regression model that uses the number of internships completed during college to predict salary after graduation, with state included as a control variable. The purpose of this analysis is to give FSB Career Services information on the relationship between number of internships and salary, so that they can give students more accurate guidance toward the best full-time opportunity for them.
Survey Overview: Usable Response %: Our initial dataset covered much more than internships and salary, so many observations provided no meaningful information to our model. After removing these observations (e.g. those with no reported salary after graduation, no reported number of internships, or no reported starting location), the dataset shrank to a bit over 50% of its original size. Out of 3,235 original observations, we were left with a little over 1,700 to work with.
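As a rough illustration of the filtering step described above, the sketch below keeps only rows where all three modeling variables are reported and then computes the usable response percentage. The data frame `raw` and its values are made up for the example, and `complete.cases` is one possible way to do the filtering, not necessarily the one used for this report.

```r
# Hypothetical toy stand-in for the raw survey export (values are invented).
raw <- data.frame(
  survey_salary      = c(52000, NA, 61000, 58000, NA, 49000),
  survey_internships = c(1, 2, NA, 3, 0, 2),
  survey_state       = c("OH", "IL", "OH", NA, "NY", "OH"),
  stringsAsFactors   = FALSE
)

# Keep only rows where salary, internships, and state are all reported.
model_vars <- c("survey_salary", "survey_internships", "survey_state")
usable <- raw[complete.cases(raw[, model_vars]), ]

# Usable response rate as a whole-number percentage.
usable_rate <- round(100 * nrow(usable) / nrow(raw), 0)
print(usable_rate)
```

In the toy data only two of the six rows survive, which mirrors how a large share of the real responses had to be dropped.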
Distribution of Number of Internships: To understand the data better, we explored how many internships students completed before entering the workforce. According to this basic histogram, most students completed 1 or 2 internships, followed by 3, while 0, 4, and 5 had fairly few instances. The distribution looks roughly normal, with a slight right skew.
Internship Effects on Salary: Mean Salary by Number of Internships: This bar chart shows average salary after graduation, grouped by the number of internships each student completed. Average salary rises with each additional internship up to 3, after which it levels off. We expected salary to increase with each additional internship, and the chart supports this up to that point. It is interesting, however, that internships beyond the third do not appear to raise salary, and that average salary is even slightly lower at 5 internships (potentially because so few observations have 5 internships). This could indicate that students do not need more than 3 internships to maximize salary after graduation.
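The group averages behind a chart like this can be computed in one step. The sketch below uses a small invented data frame (`toy`) with the same column names as the survey data; the salaries are illustrative only and are not taken from the real dataset.

```r
# Hypothetical toy data: two students per internship count (invented values).
toy <- data.frame(
  survey_internships = c(0, 0, 1, 1, 2, 2, 3, 3),
  survey_salary      = c(50000, 52000, 55000, 57000, 58000, 60000, 61000, 63000)
)

# Mean salary within each internship count; one row per group.
mean_by_count <- aggregate(survey_salary ~ survey_internships,
                           data = toy, FUN = mean)
print(mean_by_count)
```

The resulting table has one mean per internship count, which is what each bar in the chart represents.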
Regression Model Predicting Salary by Number of Internships:
Using a regression model, we predicted salary from the number of internships, with 0 internships as the baseline.
Having 1 internship increases the predicted salary by $5243 compared to 0 internships.
Having 2 internships increases the predicted salary by $7775 compared to 0 internships.
Having 3 internships increases the predicted salary by $9873 compared to 0 internships.
Having 4 internships increases the predicted salary by $9894 compared to 0 internships.
Having 5 internships increases the predicted salary by $9388 compared to 0 internships.
Predicted salary rises by a sizable amount with each additional internship up to 3. Four internships is about the same as 3, and 5 internships even lowers the predicted salary slightly relative to 3 or 4.
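To make the "0 internships as a baseline" reading concrete, here is a minimal sketch on invented data. When the internship count is treated as a factor, `lm` reports the intercept as the mean salary of the first level (0 internships) and each remaining coefficient as the difference between that group's mean and the baseline. The data frame `toy` and its values are assumptions for illustration.

```r
# Hypothetical toy data; internship count stored as a factor so that
# each count gets its own coefficient relative to the first level ("0").
toy <- data.frame(
  survey_internships = factor(c(0, 0, 1, 1, 2, 2, 3, 3)),
  survey_salary      = c(50000, 52000, 55000, 57000, 58000, 60000, 61000, 63000)
)

fit   <- lm(survey_salary ~ survey_internships, data = toy)
coefs <- coef(fit)

# Group means, for comparison with the fitted coefficients.
group_means <- tapply(toy$survey_salary, toy$survey_internships, mean)

# Each non-intercept coefficient equals (group mean - baseline mean).
diffs_from_baseline <- group_means[-1] - group_means[1]
print(coefs)
print(diffs_from_baseline)
```

A model that also adjusts for state could be written as `lm(survey_salary ~ survey_internships + survey_state, data = toy)` once a state column is present; that variant is offered here only as a sketch of the planned control.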
Overview of survey responses
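library(flexdashboard)  # provides gauge() and gaugeSectors()
# Share of the original responses (df2) that remain in the filtered data (df)
rate <- round(100 * nrow(df) / nrow(df2), 0)
# Gauge coloring: 80-100% success, 40-79% warning, 0-39% danger
gauge(rate, min = 0, max = 100, symbol = "%", gaugeSectors(
  success = c(80, 100), warning = c(40, 79), danger = c(0, 39)
))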
Call:
lm(formula = df$survey_salary ~ df$survey_internships)

Residuals:
   Min     1Q Median     3Q    Max
-51384  -6755    327   5745 115713

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             51591.1      974.2  52.956  < 2e-16 ***
df$survey_internships1   5163.6     1089.6   4.739 2.32e-06 ***
df$survey_internships2   7696.0     1069.1   7.199 9.05e-13 ***
df$survey_internships3   9793.1     1223.7   8.003 2.21e-15 ***
df$survey_internships4   9929.2     2235.3   4.442 9.48e-06 ***
df$survey_internships5   9308.9     4260.5   2.185    0.029 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11730 on 1720 degrees of freedom
Multiple R-squared:  0.04577,	Adjusted R-squared:  0.04299
F-statistic:  16.5 on 5 and 1720 DF,  p-value: 6.139e-16
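As a worked reading of the output above, the sketch below takes the printed estimates and standard errors exactly as shown and derives three things: the predicted salary for each internship count (intercept plus that count's coefficient), an approximate 95% confidence interval for each coefficient, and the number of observations implied by the residual degrees of freedom. The variable names are chosen for this example only.

```r
# Values copied from the printed summary(lm) output.
intercept <- 51591.1
effects   <- c("1" = 5163.6, "2" = 7696.0, "3" = 9793.1,
               "4" = 9929.2, "5" = 9308.9)
std_err   <- c("1" = 1089.6, "2" = 1069.1, "3" = 1223.7,
               "4" = 2235.3, "5" = 4260.5)
resid_df  <- 1720

# Predicted salary per internship count: baseline plus the group's effect.
predicted <- c("0" = intercept, intercept + effects)

# Approximate 95% confidence interval for each effect, using the t quantile.
t_crit <- qt(0.975, df = resid_df)
ci <- cbind(lower = effects - t_crit * std_err,
            upper = effects + t_crit * std_err)

# Observations used in the fit: residual df plus the 6 estimated coefficients.
n_obs <- resid_df + length(effects) + 1

print(round(predicted, 1))
print(round(ci, 1))
print(n_obs)
```

The interval for 5 internships comes out far wider than the others because its standard error is the largest in the table, which is consistent with there being few students in that group. The multiple R-squared of 0.04577 also means that internship count alone accounts for under 5% of the variation in salary in this fit.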